Remarks

Regarding the requested amendments to Specification 

In the Final Office Action the Examiner objected to the Specification (see point 1, page 2 of the Office 
Action) because of the informality that the information in the first paragraph of the application needed to 
be updated to indicate that the parent '768 and '068 applications are now abandoned. The presently
requested amendment to the Specification cures this defect and informality. In addition, the new
paragraph now also correctly indicates that application 09/623,068 is a National Stage Entry under 35
U.S.C. 371 of, and claims priority from, PCT/US99/04376 (see USPTO File Wrapper: Query Control
Form dated 3/17/2004 and Correction of Bibliographic Data on page 1 of Specification dated 
3/26/2004). 

Regarding new paragraph [0026.1] that is requested to be added to the Specification. 

The addition of this paragraph is requested to provide more literal antecedent basis in the Specification 
for a limitation that is already present in pending, previously allowed claims. As noted in MPEP 
2173.05(e), it is not necessary that claim terms or phrases have literal antecedent basis in the 
specification. Nevertheless, applicants respectfully request the addition of this paragraph. The paragraph
deals with conventional linkage study techniques that are essentially one-dimensional and with essentially
one-dimensional marker panels. "An essentially one-dimensional panel of markers for a linkage study" is
any conventional (at the time of filing of the first priority document, US Provisional 60/76,102) linkage
study marker panel chosen to attempt to achieve one-dimensional closeness and linkage of one or 
more panel markers and the (or a) sought trait-causing polymorphism. 

The subject matter of the requested paragraph [0026.1] is not "new matter". The basis for the addition of 
paragraph [0026.1] to the Specification is in MPEP 2163.06 and in MPEP 2163.07. The first paragraph 
of MPEP 2163.06 states: "information contained in ... the specification ... as filed may be added to any
other part of the application without introducing new matter". The relevant sections of MPEP 2163.07
(Amendments to Application Which Are Supported in the Original Description) are I. REPHRASING and
Inherent Function, Theory or Advantage.

It is well known in the art of linkage studies that linkage study "markers are chosen based on a principle
of one-dimensional closeness" in the chromosomal location dimension. This chromosomal closeness or
nearness is cited in the present application at bottom [0009], top [0011] and bottom [0016]. Specifically,
"By establishing linkage, especially strong linkage, between a known marker and an unknown gene [or
trait-causing polymorphism] it is possible to locate the gene [or trait-causing polymorphism] near to the
chromosomal location of the known marker." And "Linkage studies are a method of establishing linkage
between a marker and a gene [or trait-causing polymorphism] or genes." "Strong positive evidence for
linkage of the markers (from the scanned chromosomal region) to a gene or genes responsible for a 
characteristic or trait is strong evidence that a trait-causing gene or genes is located within the 
chromosomal region." (The application uses the terms "gene" and "trait-causing polymorphism" 
interchangeably, see [0005] and Definitions Section [0059].) 

The concept of chromosomal closeness or nearness is also present in the term "linkage". See enclosed 
definitions of linkage. Specifically the Genome Glossary of the U.S. Government's Human Genome 
Project (http://www.ornl.gov/sci/techresources/Human_Genome/glossary/glossary.shtml#L) dated
1/30/2007 defines "Linkage: The proximity of two or more markers (e.g., genes, RFLP markers) on a
chromosome; the closer the markers, the lower the probability that they will be separated during DNA
repair or replication processes (binary fission in prokaryotes, mitosis or meiosis in eukaryotes), and
hence the greater the probability that they will be inherited together." See also the enclosed, marked
pages of the Encyclopedia of Molecular Biology and Medicine (1996), editor Robert A. Meyers. On page
377, volume 3, Linkage (of genes) is defined as: "The tendency of genes to be inherited together based
on proximity within the same chromosome". And on page 222, Volume 1, under Linkage Analysis, the
Encyclopedia says: "When a gene ... is located on the same chromosome pair as marker and close to it
(i.e., when gene and marker are linked) ..."

Similarly, for example, the bottom of [0035] of the present application refers to one-dimensional 
closeness in a one-dimensional view (or perspective). And non-limiting examples of conventional 
essentially one-dimensional linkage study scanning techniques given in the application (see for example 
[0020]) use a strategy that attempts to locate at least one marker near the (or a) sought trait-causing 
polymorphism (in a chromosomal region) by distributing markers approximately evenly (along the length 
of the chromosomal region). Another non-limiting example of a conventional essentially
one-dimensional linkage study, based on one-dimensional closeness, is the example given in mid [0027],
specifically the TDT association study of Risch and Merikangas. The TDT association study of Risch 
and Merikangas is based on the optimal assumption of the analyzed allele being the disease allele (i.e., 
the assumption of a study marker being the disease-causing polymorphism). (As is well-known, in the 
case of association studies, markers are chosen to attempt to achieve closeness, linkage and linkage 
disequilibrium between one or more of the markers and the (or a) trait-causing polymorphism.)

The favoring in conventional linkage study techniques for markers with least common allele frequencies
near 0.5 is discussed in [0026]. However, the application repeatedly emphasizes that comparatively little
attention is paid to the allele frequency dimension, see for example, [0023], [0024], middle [0026], top
[0035]. This comparatively little attention arises because (as stated in new paragraph [0026.1] and the
Amendment/Response of December 2004) "Conventional, essentially one-dimensional marker panels
are ... not chosen based on using the principle of the similarity of marker allele frequency and possible
trait-causing polymorphism allele frequency to increase the power of an association-based linkage test
to detect evidence for linkage".

This limitation is present in pending, previously allowed claims and was discussed on page 11 of the 
Amendment/Response of December 2004. For the Examiner's convenience, that discussion is
reproduced here: see, e.g. [0019], [0020], top [0024] and [0035] (i.e.,
conventional linkage study techniques are essentially one dimensional, focus on the dimension of 
chromosomal location but give little attention to the dimension of allele frequency) and see, e.g. [0308] 
"It is well known that increased disequilibrium between a marker and linked disease locus increases 
evidence for linkage provided by association-based linkage tests such as the TDT. However, what has 
not been recognized is that the specific allele frequencies of the marker locus can also have an 
enormous impact on the strength of evidence for linkage." And see a rendition of the principle that the
inventor discovered: e.g. [0285], i.e., the power of association-based tests for linkage is increased as
the allele frequencies of the disease-causing (or trait-causing) allele of a bi-allelic gene (or
polymorphism) and a positively associated allele of a linked bi-allelic marker become similar in
magnitude. That is, conventional (essentially one-dimensional) techniques are not based on using
similarity of marker allele frequency and possible trait-causing polymorphism allele frequency to
increase the power of an association-based linkage test to detect evidence for linkage.

As stated in the Amendment/Response of December 2004, the reason for the comparative inattention to
allele frequency is that the above principle (of the similarity of marker allele frequency and possible
trait-causing polymorphism allele frequency increasing the power of association-based linkage tests) 
was discovered by the inventor and was unrecognized by conventional techniques; see, for example, 
[0308] and top [0285]. The principle was, for example, unrecognized at the time of the conventional, 
essentially one-dimensional TDT association study of Risch and Merikangas in September of 1996, see 
[0027]. There is nothing mentioned about the principle in the Risch and Merikangas reference or in the 
Kruglyak reference [0026] (Nature Genetics September 1997). The inventor's original manuscript was 
first submitted for publication in December of 1996 (see [0285]), but was not published until 1998.

Regarding the amendments to previously pending claims

The limitation "wherein the group of two or more covering markers is not an essentially
one-dimensional panel of markers for a linkage study, wherein the essentially one-dimensional panel is a
panel not based on using similarity of marker allele frequency and possible trait-causing polymorphism 
allele frequency to increase the power of an association-based linkage test to detect evidence for 
linkage" has been expressly added to each of the previously pending, allowed independent claims 6, 16, 
39, and 44 and to previously pending claim 61. The limitation was already present in previously
pending, allowed dependent claims 7, 17, 40, 45 and in previously pending dependent claim 62. Each of
currently amended claims 6, 16, 39, and 44 continues to be within the scope of previously allowed claims
6, 16, 39, and 44. The limitation is discussed in detail above in relation to new paragraph [0026.1]. The 
limitation was also previously discussed in the Amendment/Response of December 2004, p. 11, see 
above. 

Claim 15 has been cancelled and a similar new claim, new independent claim 81, has been
submitted. Claim 15 was not in exact conformity with the description at [0169] and [0170]. New claim 81
is in closer conformity with the description at [0169] and [0170]. The limitation "wherein the localizing 
uses a technique or techniques that detects gradients, wherein the detection technique or techniques 
uses a gradient along the allele frequency dimension" was present in previously allowed claim 15. The 
limitation was discussed in the Remarks on page 13 of the December 2004 Amendment/Response. 
Those Remarks state: "Regarding support .... see, e.g. [0169], [0170], [0171]. See also, e.g. [0285] 
through [0289] inclusive and [0296] which describe increases in power along the allele frequency 
dimension, i.e. one or more gradients in power along the allele frequency dimension". (Power and
statistical evidence for linkage are closely related or equivalent, see [0286].) 

The invention of claim 15 is a two-dimensional linkage study technique and is based (as are all the other 
claimed inventions) on the inventor's newly discovered principle of the similarity of marker allele 
frequency and possible trait-causing polymorphism allele frequency increasing the power of association- 
based linkage tests. Conventional linkage studies do not make use of a gradient in statistical evidence 
for linkage or power along the allele frequency dimension. Applicants respectfully submit that the 
invention is novel and unobvious by virtue of the limitation "wherein the localizing uses a technique or 
techniques that detects gradients, wherein the detection technique or techniques uses a gradient along 
the allele frequency dimension".

New claim 82 depends from new claim 81. Like most of the other claimed inventions, the marker panel
used in the invention of claim 82 is expressly defined as "not an essentially one-dimensional panel for a
linkage study". Put another way, new claim 82 comprises steps a), b), c), d) and e) of the claimed
process of claim 6. So claim 82 is equivalent to: A process for identifying one or more bi-allelic
markers linked to a bi-allelic trait-causing polymorphism in a species of creatures, comprising
acts of: a), b), c), d) and e) of claim 6, further comprising the act of: f) localizing the trait-causing
polymorphism to the chromosomal location-least common allele frequency (CL-F) location of 
one or more markers that show evidence for linkage based on the calculations of act e), wherein 
the localizing uses a technique or techniques that detects gradients, wherein the detection 
technique or techniques uses a gradient along the allele frequency dimension. 

Claim 57 has been amended and the identifier "a)" has been eliminated from the claim. Applicants
respectfully submit that the amendment is a mere informality and does not change the scope of the
claim. Applicants believe that the amendment increases claim clarity as the identifiers "a)" and "b)" are
used in independent claim 39 from which claim 40 and claim 57 depend.

Claims 61-67 were rejected in the Final Office Action of October 2006 as indefinite under 35 USC 112, 2nd
paragraph. The rejection referred to MPEP 2172.01 and the omission of critical steps. 
An Examiner Interview regarding the rejection of these claims was conducted on December 21, 2006 
and applicants respectfully offered the following arguments with respect to patentability of these claims. 
Applicants argued that MPEP 2172.01 also cites MPEP 2164.08(c), which states "Features which are 
merely preferred are not to be considered critical". And applicants presented support in the description 
for practicing the claimed processes without the minimal steps recited in claim 39. 

More specifically, applicants cited parts of the description which indicate the processes can be practiced 
by a computer without the minimal steps recited in claim 39. (It is possible, for example, to obtain 
genotype data/sample allele frequency data from a stored computer file without working directly with 
chromosomal DNA.) Paragraph [0173] states: "It is also possible for a computer program to execute any
one of the steps or step-like parts of Process #1". And paragraphs [0208] and [0209] recite an
appropriately programmed computer as an example of a means to obtain genotype data/sample allele
frequency data of step d) of Process #1. And paragraph [0231] refers to a process to obtain genotype
data/sample allele frequency data similar to the data of d) of Process #1.
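
By way of illustration only, obtaining sample allele frequency data from a stored computer file, rather than by working directly with chromosomal DNA, might look like the following minimal Python sketch. The file layout, the column names (`subject`, `marker`, `allele1`, `allele2`), the sample rows, and the function name `sample_allele_frequencies` are all assumptions made for this example; none of them is taken from the application.

```python
import csv
import io
from collections import Counter


def sample_allele_frequencies(genotype_file):
    """Return {marker: {allele: frequency}} from stored genotype rows.

    Each row is assumed (hypothetically) to hold one subject's two alleles
    at one marker, in columns named marker, allele1 and allele2.
    """
    counts = {}
    for row in csv.DictReader(genotype_file):
        marker_counts = counts.setdefault(row["marker"], Counter())
        marker_counts[row["allele1"]] += 1
        marker_counts[row["allele2"]] += 1
    return {
        marker: {allele: n / sum(c.values()) for allele, n in c.items()}
        for marker, c in counts.items()
    }


# Hypothetical stored genotype data: one bi-allelic marker, three subjects.
STORED = """subject,marker,allele1,allele2
s1,M1,A,a
s2,M1,A,A
s3,M1,a,a
"""

freqs = sample_allele_frequencies(io.StringIO(STORED))
print(freqs["M1"])
```

The sketch reads previously collected genotype data and derives sample allele frequencies from it by simple counting; a real study file would of course carry pedigree and phenotype information as well.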

The claimed processes in claims 61-67 are essentially processes for practicing the step-like part of
obtaining genotype data/sample allele frequency data recited in d) of Process #1, see [0155] and [0152]
and [0154]. Applicants respectfully submit that these processes are novel and unobvious because the 
group of two or more markers which systematically cover a CL-F region (which are also not an 
essentially one-dimensional panel) are essentially novel and unobvious. 

Applicants respectfully submit that the terms "genotype data" and "sample allele frequency data" are 
definite and are well-known in the art. In addition, the related term "genotype data/sample allele 
frequency data" is specifically defined in the application (see [0148] and related paragraph [0147]). 
Applicants respectfully submit that the act of "obtaining" such data for use in a linkage study is definite 
(see [0166]). 

McMahon et al., Integrating Clinical and Laboratory Data in Genetic Studies of Complex Phenotypes:
A Network-Based Data Management System (American Journal of Medical Genetics, 81:248-256
(1998)) is a reference that was published about the time of filing of the parent PCT application and
Provisional priority applications. The reference describes a computer-based system for storing, 
managing and accessing genetic data, including genotype data. Some marked pages from the 
reference are submitted to the Examiner as illustrative of knowledge in the art of using previously 
collected genotype data in linkage studies. 

The inventor's paper in the Annals of Human Genetics (see [0029] and footnote 11) is an integral part of
the application and is incorporated by reference into the application (see [0333]). The first page of this
paper refers to three linkage studies (Julier et al., Spielman et al., and Thomson et al.) that used
previously collected genetic data, including genotype data, from Genetics Analysis Workshop 5 (GAW5) 
families. Some marked pages from the inventor's paper in the Annals of Human Genetics (AHG98) and 
from the Julier, Spielman, and Thomson references are submitted to the Examiner as illustrative of 
knowledge in the art of using previously collected genotype data in linkage studies. 

Some further facts about the Examiner Interview. As related to the Examiner in the December 21,
2006 interview, the Kruglyak reference (Nature Genetics September 1997 [0026]) is an example of a
conventional, essentially one-dimensional linkage study technique or approach. As related to the 
Examiner in the December 2006 interview, the inventor's original unpublished manuscript (first 
submitted for publication in December of 1996, see [0285]) was, at first, rejected for publication. The 
unpublished manuscript contained concepts about the unrecognized importance of allele frequency (see 
for example mid to bottom [0031], [0290], [0293], [0300]).

New, dependent claims 83-90 have been submitted. These new claims are similar to previously
allowed claims and all have limitations present in previously allowed claims. More specifically, new
claims 83, 86 and 89 are similar to allowed, pending claims 17, 46 and 69 respectively. But these new
claims do not have the limitation, "wherein each bi-allelic covering marker is an exact, true bi-allelic
marker". As previously discussed on p. 13 of the Amendment/Response of May 30, 2006, the term
"bi-allelic markers" in the art generally means exact, true bi-allelic markers. (For example, SNPs are
examples of such exact, true bi-allelic markers.) The specification, however, also expands the term
"bi-allelic marker" somewhat and describes bi-allelic marker equivalents or BMEs (mathematical markers
formed from one or more markers that act like they are bi-allelic) and approximate bi-allelic markers, see 
for example [0054] and [0055]. 

Correction of Attorney misstatements in the record. The applicants now correct two misstatements
in the Remarks of previously filed responses made by the applicants' attorney. These corrections are
made to avoid any future confusion. (As the Court in Biotec Biologische vs. Biocorp (249 F.3d 1341, 58
USPQ2d 1737) essentially noted, attorney errors in the prosecution record must be viewed in context 
and be considered in light of other statements in the same prosecution record.) 

Misstatement 1: The applicants' attorney has previously made the following misstatement in the 
Remarks Section of previously filed responses (p. 11 December 2004 & p. 20 August 2006): "For the
record, the applicants note that the linkage disequilibrium in the well known principle quoted
above from [0308] is essentially measured in a specific way: i.e. the increased disequilibrium is
computed respectively as δ/δmax for δ > 0 or δ/δmin for δ < 0, wherein each of the δ values is a
value of the coefficient of disequilibrium. This is the way (or essentially the way) that increased
linkage disequilibrium is computed in the application in paragraphs [0291], [0292], in Table 2 on
page 21, in AHG 98 in Tables 1, 2, and 3 pp. 165, 167." This, however, is a misstatement.
Clarification: Paragraph [308] of the application states: "It is well known that increased disequilibrium
between a marker and linked disease locus increases evidence for linkage provided by
association-based linkage tests such as the TDT." The applicants now make the following clarification. It is true
that such a concept as stated above in [308] was well known in the art. And this concept is consistent
with the findings in the inventor's paper (Annals of Human Genetics, 1998, vol. 62, pp. 159-179, referred
to herein as AHG98). These findings in the inventor's paper (AHG98) include (p. 160): "(2) TDT power is 
increased by disequilibrium between a bi-allelic marker and disease locus".

It is also true that in AHG98, disequilibrium, including increased disequilibrium, is measured (or
essentially measured) in a specific way: i.e. the disequilibrium is computed respectively as δ/δmax for
δ > 0 or δ/δmin for δ < 0, wherein each of the δ values is a value of the coefficient of disequilibrium.

However, the statement of the well-known concept in [308] of the application does not necessarily mean
that the disequilibrium (or increased disequilibrium) in the concept is measured (or essentially
measured) as δ/δmax for δ > 0 or δ/δmin for δ < 0. Such a statement is not made in the originally filed
application. Applicants' attorney's statements to this effect in the earlier cited Remarks (p. 11 December
2004 & p. 20 August 2006) were erroneous. Moreover, the findings in AHG98 are not admitted to be
prior art with respect to the present invention by any Remarks made or by paragraph [308].

Misstatement 2: The applicants' attorney has previously made the following misstatement in the
Remarks Section of previously filed responses (bottom p. 20 August 2006): "The linkage
disequilibrium (including increased linkage disequilibrium) which essentially one-dimensional
panels attempt to achieve is measured (or essentially measured) as δ/δmax when δ > 0 and
δ/δmin when δ < 0." Again, this statement is erroneous. As stated above, the inventor's findings in
AHG98 are consistent with the well-known concept in [308], but the linkage disequilibrium (including
increased linkage disequilibrium) which essentially one-dimensional panels attempt to achieve is not
necessarily measured (or essentially measured) as δ/δmax when δ > 0 and δ/δmin when δ < 0. And the
term "an essentially one-dimensional panel of markers for a linkage study" should not necessarily be
construed to include the measurement limitations above involving δ/δmax when δ > 0 and δ/δmin when
δ < 0. Such a statement is not made in the originally filed application. Rather, "an essentially
one-dimensional panel of markers for a linkage study" is any conventional (at the time of filing of the first
priority document, US Provisional 60/76,102) linkage study marker panel chosen to attempt to achieve
one-dimensional closeness and linkage of one or more panel markers and the (or a) sought
trait-causing polymorphism. Such essentially one-dimensional panels are described above, including in new
paragraph [0026.1].
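
For illustration only, the normalized measure discussed in the two corrections above (δ/δmax for δ > 0, δ/δmin for δ < 0) can be sketched in Python as follows. The sketch assumes the standard population-genetics convention in which δ is the haplotype frequency minus the product of the two allele frequencies, and δmax and δmin are the largest and smallest values of δ those allele frequencies permit. That convention, and the function and parameter names, are assumptions made for this example; the definitions used in the application and in AHG98 govern and may differ.

```python
def normalized_disequilibrium(p_ab, p_a, p_b):
    """Return delta/delta_max if delta > 0, or delta/delta_min if delta < 0.

    Assumed (textbook) definitions, not quoted from the application:
      p_ab  frequency of haplotypes carrying both marker allele A and
            trait allele B
      p_a   frequency of marker allele A
      p_b   frequency of trait allele B
      delta = p_ab - p_a * p_b  (coefficient of disequilibrium)
    """
    delta = p_ab - p_a * p_b
    if delta > 0:
        # Largest delta compatible with the two allele frequencies.
        delta_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
        return delta / delta_max
    if delta < 0:
        # Smallest (most negative) delta compatible with the frequencies.
        delta_min = max(-p_a * p_b, -(1 - p_a) * (1 - p_b))
        return delta / delta_min
    return 0.0


# Complete positive association between two alleles of frequency 0.5.
print(normalized_disequilibrium(0.5, 0.5, 0.5))
# Half of the maximum possible positive disequilibrium.
print(normalized_disequilibrium(0.375, 0.5, 0.5))
```

Under these assumed definitions the ratio lies between 0 and 1 in both cases, because δ and its bound always share the same sign.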

Conclusion



All points of rejection and objection in the Final Office Action of October 3, 2006 have been answered. 
In this RCE & Amendment/Response the applicants have submitted an amended first paragraph of the
Specification and have requested the addition of new Specification paragraph [0026.1]. Some claims
have been amended as discussed above, 1 claim has been cancelled, and 1 new independent claim and 9
new dependent claims have been added.

Appropriate small entity fees for a 1 month extension, an RCE, and extra claim fees for 1 new 
independent and 9 new dependent claims are also enclosed. 

For the reasons advanced above, applicants respectfully submit that the application is now in condition
for allowance, and such action is earnestly solicited.

Respectfully submitted, 



Robert O. McGinnis 
Registration No. 44,232
February 5, 2007
1575 West Kagy Blvd.
Bozeman, MT 59715
tel (406)-522-9355 




Enclosures: Selected pages from the McMahon, the Inventor's paper AHG98, Julier, Spielman, and 
Thomson references, some marked. Total of 15 pages.

The identification of genes underlying a complex phenotype can be a massive undertaking, and may
require a much larger sample size than thought previously. The integration of such large volumes of
clinical and laboratory data has become a major challenge. In this paper we describe a network-based
data management system designed to address this challenge. Our system offers several advantages.
Since the system uses commercial software, it obviates the acquisition, installation, and debugging of
privately-available software, and is fully compatible with Windows and other commercial software. The
system uses relational database architecture, which offers exceptional flexibility, facilitates complex
data queries, and expedites extensive data quality control. The system is particularly designed to
integrate clinical and laboratory data efficiently, producing summary reports, pedigrees, and exported
files containing both phenotype and genotype data in a virtually unlimited range of formats. We
describe a comprehensive system that manages clinical, DNA, cell line, and genotype data, but since
the system is modular, researchers can set up only those elements which they need immediately,
expanding

INTRODUCTION

The identification of genes underlying a complex phenotype can be a massive undertaking. Data
management for such studies must cope with large sample sizes, multiple data storage sites, and some
data that change over time. The integration of such large volumes of clinical and laboratory data has
become a major challenge in genetic studies of complex phenotypes.

Gene identification in complex phenotypes may require a much larger sample size than thought
previously. Large sample sizes may be needed for the initial detection of linkage, and even larger
sample sizes for replication of linkage findings [Suarez et al., 1995]. Narrowing the linkage finding to
a physically-mappable chromosomal location may require more than 2,000 affected sib-pairs [Kruglyak
and Lander, 1995]. Furthermore, each affected subject is typically associated with a large amount of
clinical data and the more than 300 genotypes that are generated in genome-wide linkage searches.

Each type of primary data has special storage requirements. Clinical assessments are typically
collected on handwritten forms and are often supported by copies of medical records and other
documentary evidence in various nonstandard formats. Blood samples and cell lines must be tracked
from subject to freezer. Genotype data may exist in the form of autoradiographs or the specialized
data files produced by automated genotyping systems.

The data generated in genetic studies are not static, but change over time. Previously unavailable
relatives may volunteer for the study or previously studied subjects may die. Clinical data may change
after longitudinal follow-up. DNA sample supplies dwindle and must be replaced. Genotype data may
require correction if they fail to segregate or when false paternity or sample mix-ups are detected.
Thus, regular archiving and updating of data are required to forestall degeneration of the database
over time.

In this paper we describe a network-based data man- 
agement system designed to address the special re- 
quirements of family studies of complex phenotypes. 
The system is expandable, modular, and easily adapted 
to a wide variety of studies. The system uses existing 
relational database, pedigree, and networking software 
and standard PC hardware. The efficient integration of 
clinical and laboratory data in the form of output files, 
summary reports, and pedigrees is a major feature. 

MATERIALS AND METHODS
User Input

Our first task in designing a data management system focused on the "users," i.e., the clinicians,
research assistants, and laboratory technicians who would be using the system every day. We involved
the users in deciding which data elements needed to be considered, the naming of fields and tables,
and the design of data entry forms. After each module of the system was made available, users were
polled in a series of feedback meetings about whether that module was accomplishing the desired tasks
in an efficient and user-friendly fashion. If not, that module was redesigned to better fulfill user
needs. After the system was entirely in place, further expansions and modifications were considered
as needed.

Hardware Requirements

The hardware requirements for this system depend on the size of the sample to be studied and the
modules used. At least one computer is required to house the main database, and (ideally) at least one
computer is devoted to each module, located in a spot where the users will have ready and continuous
access. As the main computer we use a Pentium 75-MHz machine (Compaq Computer Corp., Houston, TX)
with 16 MB of RAM, a 2-gigabyte hard drive, and a Colorado Jumbo 1400 tape drive (Hewlett Packard,
Palo Alto, CA). These should be viewed as reasonable minimum requirements, since this computer stores
all the data and acts as a server for the entire system. For each module we use at least one
486/66-MHz or Pentium 75-MHz computer with at least 16 MB RAM and a 500-MB hard drive. For the
genotype module, a Macintosh computer (Apple Computer, Cupertino, CA) is also needed if automated
genotypes will be processed using the GeneScan 2.1fc2 and Genotyper 1.1 programs (Applied
Biosystems, Foster City, CA).

If additional computers are being used, then a network must also be in place and each networked
computer needs a network card. The network system we use is a departmental local area network (LAN),
connecting each computer to a central hub, with structured cabling utilizing 10BaseT Ethernet. The hub
also connects the LAN to the campus-wide network using fiber-optic cable, which provides access to the
Internet.



Each computer should be connected to a printer, either through the network or through a printer
buffer. We use two different types of printers. Our main printer is a Hewlett Packard (HP) LaserJet 4M
Plus (Hewlett Packard, Palo Alto, CA), which is linked to the network and is centrally located for
multiple users. We also have HP Inkjet 500-series printers attached directly to the computers that
serve the DNA, cell line, and genotype modules.

The DNA module is designed to work with a spectrophotometer that measures DNA concentration and
quality. We use an HP Diode Array Spectrophotometer (model 8452A) connected to an HP Vectra 486/33N
computer with 16 MB RAM. Included with the spectrophotometer is general scanning and quantitation
software that enables the user to program desired absorbance wavelengths for DNA quantitation.

Software Requirements

This data management system requires two types of software: commercial or "off the shelf" programs,
and customized programs written by us expressly for use with the database software or for software
used by individual modules.

The commercial software required includes a relational database program, backup software (usually
provided with the backup drive), a pedigree drawing program, and optional network software. The
relational database software is the keystone, providing storage, querying, and reporting capabilities.
The relational database software must also allow each module to access data both within and between
modules on different computers. One of the most useful features of this database is the ability to
output data into a pedigree drawing format, so the software chosen for the pedigree drawing should be
able to import files. We use Paradox version 5.0 (Borland Intl., Scotts Valley, CA) for the relational
database, Colorado Backup version 2.80, Cyrillic 2.1 (Cherwell Scientific, Oxford, UK) for the pedigree
drawing, and Windows for Workgroups 3.11 on the server computer. Computers used for the individual
modules use either Windows for Workgroups 3.11 or Windows NT 3.51/4.0.

The customized programs come in three types. The 
first type facilitates the use of the database software. 
These programs consist of data entry forms, reports, 
and queries that are part of the relational database 
program options; only knowledge of that software is 
needed. The second type of customized program builds 
on the options within the database software (for Paradox
these programs are written in ObjectPAL). For example,
we have created "smart forms" that aid data
entry by filling some fields with prespecified default
values, skipping inappropriate fields based on previously
entered values (e.g., skipping an "If Yes, specify"
field when "no" was entered into the previous field),
and automatically calculating sums and differences.
Other forms provide a "button" that when "pressed" 
executes other programs, e.g., backing up the data files 
or archiving old data. These custom programs, which 
modify and extend features within the database soft- 
ware, call for additional knowledge of that software 



Network-based Data Management System 253 



The primary output for the DNA database is a report 
listing the box location, current volume, and total 
amount of DNA per vial and per subject. Other reports 
can be generated that essentially function as flags. For 
example, reports are generated when a particular DNA 
vial is of poor quality (e.g., out-of-range 260/280-nm
ratio) or when the amount of DNA for a particular subject
goes below a user-specified value, thereby alerting
the technician to begin cell culture for the extraction of 
new DNA. The main output file can also be interfaced 
with the cell line, clinical, and genotype databases. 

Cell Line Module 

Structure. The cell line module consists of three
related tables (Fig. 4). The Growth table contains data 
about each growth attempt, recording the number of 
attempts, quality of the growth, and reasons for any 
failure. The Storage table contains the box and coordi- 
nate location data for each cell line vial in the freezer, 
giving an up-to-date inventory of cell lines available for 
each subject studied. The Usage History table records 
any additions or removals to the freezer, thus tracking 
the usage of cell lines, and facilitating error checks and 
audits. 

Data flow. When a blood sample arrives at the 
laboratory, an attempt is made to grow a cell line. The 
vigor and quality of the culture and relevant dates are 
recorded in the Growth table for every growth attempt. 
Once a cell line is grown successfully, the storage in- 
formation is entered into the Usage History table, with 
a field showing that it is a new addition. These data are 
automatically copied into the Storage table. When a



cell line is removed from the freezer, this fact is also 
entered into the Usage History table, with the field
showing it as a new removal. The vial is then automati- 
cally deleted from the Storage table. As a result, the 
Storage table always contains an accurate inventory of 
the available cell lines and their locations. The Usage 
History table contains a record of all additions and re- 
movals, and thus acts as an archive for the Storage 
table. 
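The data flow above can be sketched in a few lines of Python. This is only an illustration of the logic described (every addition or removal is logged in Usage History and mirrored in Storage); the class name, field names, and vial identifiers are hypothetical and are not taken from the Paradox implementation.

```python
from dataclasses import dataclass, field


@dataclass
class CellLineFreezer:
    """Hypothetical stand-in for the Storage and Usage History tables."""
    storage: dict = field(default_factory=dict)        # vial_id -> (box, coordinate)
    usage_history: list = field(default_factory=list)  # append-only log of events

    def add_vial(self, vial_id, box, coordinate):
        # Log the addition, then copy the vial into the current inventory.
        self.usage_history.append(("addition", vial_id, box, coordinate))
        self.storage[vial_id] = (box, coordinate)

    def remove_vial(self, vial_id):
        # Delete the vial from the inventory (KeyError if it is not stored),
        # then log the removal so the history keeps a permanent record.
        box, coordinate = self.storage.pop(vial_id)
        self.usage_history.append(("removal", vial_id, box, coordinate))


freezer = CellLineFreezer()
freezer.add_vial("S001-A", box=3, coordinate="B4")
freezer.add_vial("S001-B", box=3, coordinate="B5")
freezer.remove_vial("S001-A")
```

After these three events the inventory holds one vial while the history holds three records, which is the archive behavior the text describes.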

Reports are generated to identify subjects needing to 
have a blood sample redrawn because the culture 
failed, to summarize the number and locations of cell 
line vials stored for each individual, and to alert labo- 
ratory staff that the supply of cell line vials for an in- 
dividual has gone below a user-specified value. 

Genotype Module 

Structure. The genotype module consists of four 
related tables (Fig. 5). The Genotype table contains the 
marker genotypes for each subject, in the form of arbi- 
trary allele numbers. The exact allele size in base pairs 
corresponding to each arbitrary allele number is re- 
corded in the Allele Size table. The Reader table con- 
tains the information on who read the genotypes in 
each family, with Marker ID specified. Allele Size and 
Reader are linked to other tables via Family ID, since 
arbitrary allele numbers are assigned within each fam- 
ily. The last table, Markers, contains the reference in- 
formation for all markers, linking the Marker ID to the 
marker name(s), chromosome, and location (if desired,
multiple Markers tables can be used to group markers 
in a convenient manner, such as by chromosome). 



Master Pedigree 




Fig. 4. The cell line data module tracks the growth and storage of each vial of lymphoblastoid cells generated for each subject. See Figure 2 legend
for explanation of symbols.

Fig. 5. The genotype data module tracks the results of the polymorphic DNA marker analyses for each subject as well as descriptive data about the
markers used and their chromosomal locations. See Figure 2 legend for explanation of symbols.



Data flow. After a family has been clinically 
evaluated, the individuals with a DNA sample who are 
required for linkage analysis are selected for genotyp- 
ing. This is noted in the Master Pedigree table, in a field 
called Genotyping. Another field, Need for Pedigree, 
designates the individuals selected for genotyping 
along with individuals required to connect the pedigree 
structure, e.g., parents of a sib-pair. 

Once the genotypes for each marker are determined, 
they are entered into the Genotype table. If the geno- 
types were read automatically, these data are set in an 
importable format by a semiautomated routine. Output 
from the ABI 373 sequencer is binned using the pro- 
gram Genetic Analysis System (GAS 2.0; GAS © Alan 
Young, Oxford University, 1993-1995), whose text out- 
put is imported into the database tables. If genotypes 
are read manually, the information is entered directly 
into the Genotype table. This is accomplished with a 
form that requires entering Marker ID only once per 
family and simplifies data entry by allowing entry into 
Genotype, Reader, and Allele Size tables all at once. 
This guarantees complete data for every genotype, e.g., 
that every arbitrary allele number corresponds to an 
absolute allele size value in the database. 



The primary output from the genotype module is 
linked with the phenotype data from the clinical mod- 
ule to generate linkage files for analysis. Other outputs 
can also be generated, e.g., status reports summarizing 
genotype progress by individual or marker. 

Pedigree Reports 

The most useful report format for family studies is 
often the pedigree itself. Therefore, we developed a 
method for importing any data of interest from the da- 
tabase into a pedigree drawing program, where the 
data are displayed directly on the pedigree. This ap- 
proach preserves the flexibility of the relational data- 
base while displaying the data in the way that is most 
intuitive for genetic researchers. 

For each report format, a program collects the data of 
interest from the relevant tables in the database, joins 
these data with the pedigree structure information in 
the Master Pedigree table, and formats the joined data 
for importing into the pedigree drawing program. Re- 
sidual errors in pedigree structure are easily detected 
at this point, since any errors will cause the pedigree 
drawing program to either reject the import file or 
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SUMMARY 

I compare the transmission/disequilibrium test (TDT) and affected sib pair (ASP) test under a
general algebraic model describing a bi-allelic disease locus. Assuming linkage to a bi-allelic marker,
I derive two binomial probabilities, one for parental allele 'transmission' ($P_t$) which determines the
magnitude of the TDT $\chi^2$ statistic ($\chi^2_{tdt}$), and a second for identity-by-descent (ibd) marker allele
'sharing' ($P_s$) which determines the magnitude of the ASP test statistic ($\chi^2_{asp}$). I also consider the ASP
test applied to a completely polymorphic marker and demonstrate that the probability of ASP
marker allele sharing ($P_s$) is identical to $P_s$ observed for a bi-allelic marker in equilibrium with the
disease locus. I present a general framework for determining the power of the TDT and ASP test
based on expressions for $P_t$, $P_s$ and the proportion (H/F) of ascertained parents who are informative
at the marker. Two previous analytic investigations of TDT power based on the work of Ott (1989),
and Risch & Merikangas (1996) are shown to be special cases of this general framework. In addition,
I show the relationship between the framework I present and a third analytic investigation of TDT
power for multi-allelic markers based on the work of Sham & Curtis (1995).

INTRODUCTION 

Linkage has been demonstrated between insulin-dependent diabetes mellitus (IDDM) and the
insulin gene region on chromosome 11p15.5 on the basis of linkage analysis by the transmission/
disequilibrium test or TDT (McGinnis et al. 1991; Spielman et al. 1993). Linkage was demonstrated
at the insulin 5'VNTR, a hypervariable marker that is extremely polymorphic, but whose VNTR
alleles fall into two main size classes in Caucasians, thus forming a natural bi-allelic (+/-) marker.
The + alleles were discovered to be positively associated with IDDM in case-control studies (Bell
et al. 1984). Subsequent studies then demonstrated linkage in families collected for Genetic Analysis
Workshop 5 (GAW5) by TDT analysis of GAW5 parents who were heterozygous (+/-) under the
5'VNTR bi-allelic categories (Spielman et al. 1993; see also Thomson et al. 1989, Julier et al. 1991).
The very strong evidence for linkage provided by the TDT ($\chi^2_{tdt}$ = 8.26, p < 0.005) was both
surprising and puzzling because identity-by-descent (ibd) sharing of 5'VNTR alleles in affected sib
pairs (ASPs) yielded no evidence for linkage in the same GAW5 families. Indeed, evidence for
linkage was completely undetected or 'hidden' because the proportion of alleles shared by ASPs
did not exceed the null hypothesis value of 0.5 in two different types of ASP analysis. On one
hand, there was no increase in ASP allele sharing when the analysis included all GAW5 families in
which both parents were informative for any two lengths of 5'VNTR allele (Spielman et al. 1989;
Cox & Spielman, 1989). On the other hand, when the analysis included only those ASP parents
who were evaluated by the TDT, namely those heterozygous (+/-) when the 5'VNTR is con-
Address for correspondence: Dr. Ralph McGinnis, Senior Investigator, SmithKline Beecham, New Frontiers
Science Park (North), Third Avenue, Harlow, Essex CM19 5AW.
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calculated for a marker that is distinct from the disease locus. Analysis of the equations shows 
that TDT power is greatly increased if disequilibrium is strong and if the disease allele and posi- 
tively associated marker allele have similar population frequencies. The equations also show that 
the superior power of the TDT compared to the ASP test is greatest when susceptibility loci confer 
modest disease risk, as indicated by low values of the penetrance ratio r. When a marker is strongly 
associated with a disease locus that contributes modest disease risk, $|P_t - 0.5| \gg (P_s - 0.5) \approx 0$. Thus,
the TDT is likely to play an important role in detecting and replicating linkages to loci responsible 
for complex genetic disease. 

I am deeply grateful to Richard Spielman for encouragement and valuable suggestions as this work developed. I 
am also indebted to Warren Ewens for valuable comments and for criticism that improved the manuscript. This 
research was supported by NIH grants DK46618 and DK47481 and by grant 193189 from the Juvenile Diabetes 
Foundation. 
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APPENDIX I 

Derivation of expressions for $P_s$, $P_t$, $H$

The derivations assume the general model of a bi-allelic marker and linked bi-allelic disease locus 
that is the only locus that underlies disease susceptibility (see General algebraic model of linkage 
in the main text). I begin the derivation of P s and P t (equations (1) and (2)) by first deriving 
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Summary 

A population association has consistently been observed between insulin-dependent diabetes mellitus (IDDM) 
and the "class 1" alleles of the region of tandem-repeat DNA (5' flanking polymorphism [5'FP]) adjacent to the
insulin gene on chromosome 11p. This finding suggests that the insulin gene region contains a gene or genes
contributing to IDDM susceptibility. However, several studies that have sought to show linkage with IDDM by 
testing for cosegregation in affected sib pairs have failed to find evidence for linkage. As means for identifying 
genes for complex diseases, both the association and the affected-sib-pairs approaches have limitations. It is 
well known that population association between a disease and a genetic marker can arise as an artifact of 
population structure, even in the absence of linkage. On the other hand, linkage studies with modest numbers 
of affected sib pairs may fail to detect linkage, especially if there is linkage heterogeneity. We consider an 
alternative method to test for linkage with a genetic marker when population association has been found. Using 
data from families with at least one affected child, we evaluate the transmission of the associated marker allele 
from a heterozygous parent to an affected offspring. This approach has been used by several investigators, but 
the statistical properties of the method as a test for linkage have not been investigated. In the present paper we 
describe the statistical basis for this "transmission test for linkage disequilibrium" (transmission/disequilibrium 
test [TDT]). We then show the relationship of this test to tests of cosegregation that are based on the proportion 
of haplotypes or genes identical by descent in affected sibs. The TDT provides strong evidence for linkage 
between the 5'FP and susceptibility to IDDM. The conclusions from this analysis apply in general to the study
of disease associations, where genetic markers are usually closely linked to candidate genes. When a disease is 
found to be associated with such a marker, the TDT may detect linkage even when haplotype-sharing tests 
do not. 



Introduction 

A crucial first step in finding gene loci that contribute 
to a genetic disease is to demonstrate linkage with a 
gene or DNA sequence of known location (a "marker,"
usually a DNA polymorphism). A number of investigators
have used this approach in the study of diabetes
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mellitus. Bell et al. (1984) described a population associ- 
ation between insulin-dependent diabetes mellitus 
(IDDM) and the 5' flanking polymorphism (5'FP), an
RFLP adjacent to the insulin gene on chromosome 11p.
Although it is not clear that insulin or the insulin gene 
itself plays a role in the pathogenesis of IDDM, the 
association has been found consistently in population 
studies (for a summary, see Cox et al. 1988). In unaf- 
fected controls, the frequency of the smaller, or "class 
1," alleles is approximately .70-.75, while in IDDM 
patients the frequency is somewhat higher, .80-.85.
This finding provides indirect evidence for linkage between
the insulin gene region and genes that influence
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susceptibility to IDDM, since an association between 
disease and marker may be due to disequilibrium be- 
tween linked loci. However, the problem with inferring 
linkage from population association is that association 
can occur in the absence of linkage — for example, as a 
result of population stratification. Thus it is not valid to 
use the presence of association as a test for linkage if 
population stratification is a possibility. 

For this reason, tests of linkage that do not depend 
on association were carried out by various investiga- 
tors. In most of these studies, there was no direct evi- 
dence for linkage (Hitman et al. 1985; Ferns et al. 
1986). In larger samples, the distribution of 5'FP alleles 
in 33 affected sib pairs (ASPs) with IDDM (Cox et al. 
1988) or in the 95 ASPs studied in Genetic Analysis 
Workshop 5 (GAW5) (Cox and Spielman 1989; Spiel- 
man et al. 1989) failed entirely to provide evidence for 
linkage. Thus the absence of cosegregation within fami- 
lies suggested that the association was due to popula- 
tion stratification rather than to disequilibrium with a 
linked locus. 

However, other approaches have suggested that the
association is not due solely to stratification. Using the
method of Field et al. (1986), Thomson et al. (1989)
analyzed the GAW5 family data by the following
method. In each family, the four parental 5'FP alleles
were assigned to one of two categories: (1) transmitted
to at least one diabetic offspring ("diseased") and (2)
not transmitted to any affected offspring ("control").
This method has been termed "AFBAC," for "affected
family-based controls" (Thomson 1988). As tested by a
conventional $\chi^2$, the frequency of 5'FP class 1 alleles in
the diseased category (.83) was significantly higher than 
that in the controls (.69) (p < .01). Since the control and 
disease samples are obtained from the same individuals, 
the contribution of stratification to the association is 
reduced or eliminated. However, the comparison does 
not provide a direct test for linkage. 

In the present paper we describe a procedure which 
tests directly for linkage between a disease and marker 
locus which shows population association; this test is 
not affected by the presence of stratification. The data 
for the test are from families with one or more affected 
offspring and at least one parent who is heterozygous 
for a marker allele (e.g., 5'FP class 1) associated with the
disease. The test procedure compares (a) the number of
times that such heterozygous parents transmit the associated
marker to an affected offspring with (b) the number
of times that they transmit the alternate marker
allele. Because of this focus on alleles transmitted to 



affected offspring, the test shares some features with 
the concept of haplotype relative risk (HRR; Falk and 
Rubinstein 1987) and with the AFBAC test of association
(Field et al. 1986; Thomson et al. 1989; Field 1991)
described above. However, because our emphasis is on 
testing for linkage, the actual tests are different. Since 
GAW5 (Spielman et al. 1989), the principle underlying 
this linkage test has been used explicitly (McGinnis et 
al. 1991) or implicitly (Owerbach et al. 1990; Julier et al. 
1991) in other investigations, to provide additional evi- 
dence that determinants of IDDM are located in the 
insulin gene region. 

In GAW5, Ott presented the formal theory which is 
necessary for any test of a hypothesis based on a com- 
parison of frequencies of marker alleles transmitted or 
not transmitted to affected offspring. His analysis 
showed that the probabilities of the various possible 
combinations of transmitted and nontransmitted 
marker locus alleles are determined by the association 
(disequilibrium) parameter $\delta$ and the recombination
fraction $\theta$ between the loci. However, we show below
that the $\chi^2$ procedure used as a test of association (i.e.,
AFBAC) is not, in general, valid as a test of linkage, and 
we derive a procedure which is valid. We also show (1) 
that our testing procedure also provides a test for asso- 
ciation between the two loci (indeed, the test can de- 
tect linkage only if association exists); (2) the relation- 
ship of this test to tests based on sharing of haplotypes 
or genes (identical by descent) in ASPs, affected sib 
trios, etc.; and (3) the result of applying this test to data 
on the 5'FP in IDDM. 

The Transmission Test for Linkage 
Disequilibrium 

The transmission/disequilibrium test (TDT) con- 
siders parents who are heterozygous for an allele asso- 
ciated with disease and evaluates the frequency with 
which that allele or its alternate is transmitted to af- 
fected offspring. Compared with conventional tests for 
linkage, the TDT has the advantage that it does not 
require data either on multiple affected family members 
or on unaffected sibs. However, the TDT has the dis- 
advantage that it can detect linkage between the marker 
locus and the disease locus only if association (due to 
linkage disequilibrium) is present. 

In the following sections we describe the properties 
of the TDT as a test of significance for linkage. We then 
discuss the relationship of the TDT to tests of linkage 
that are based on shared haplotypes in ASPs. 

We assume a disease locus D, with disease allele D, 
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and haplotype-sharing x 2 's can be used, separately or 
together, to test for linkage. These considerations also 
generalize to sibships with four or more affected. 

We have shown above how a $\chi^2$ statistic to test
transmission/disequilibrium can be calculated for data in
which all families have the same number of affected
children. In any real set of data, we can expect to
observe families with varying numbers of affected
children. In such a case we recommend simply combining
all affected children in the data, irrespective of number
of affected in the family, in one overall transmission/
disequilibrium $\chi^2$ statistic of the form $(B-C)^2/(B+C)$,
where $B$ is the total number of transmissions of $M_1$ to
affected children and $C$ is the total number of
transmissions of $M_2$. In the case where segregation distortion at
the M locus is a possibility, an aggregate 2 x 2-table $\chi^2$
is appropriate, corresponding to that discussed above
for the case of one affected child per family. We use
such a $\chi^2$ procedure below in the Results subsection.
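As a rough numerical illustration of the combined statistic described above, the following Python sketch computes $(B-C)^2/(B+C)$ and a 1-df chi-square tail probability. The function names are our own, and the example counts (78 and 46) are the transmissions reported later in the Results; the p-value helper relies on the standard identity for a chi-square variable with one degree of freedom and is an assumption about how the reported significance level was obtained.

```python
import math


def tdt_chi_square(b, c):
    """Combined TDT statistic (B - C)^2 / (B + C).

    b: transmissions of the associated allele (M1) to affected children
    c: transmissions of the alternate allele (M2) to affected children
    """
    if b + c == 0:
        raise ValueError("no informative transmissions")
    return (b - c) ** 2 / (b + c)


def chi_square_1df_p_value(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    Uses P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


statistic = tdt_chi_square(78, 46)
p_value = chi_square_1df_p_value(statistic)
print(f"chi-square = {statistic:.2f}, p = {p_value:.3f}")
```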



Data

The data for this study were assembled for GAW5 
from 94 families with two or more IDDM children 
(Baur et al. 1989; Spielman et al. 1989). For GAW5, 
Southern blots of genomic DNA digested with PvuII
were hybridized with phins 310 (Bell et al. 1984), a
probe specific for the 5'FP, and alleles were assigned by
eye to one of three classes corresponding to fragment
size. (Class 1 is smallest, and class 3 is largest.) Gel
positions of genomic bands and markers were also
recorded; for the present reanalysis we assigned restriction
fragments to allele class 1 if they were smaller than
1 kb, to class 2 if they were 1-2 kb, and to class 3 if they
were larger than 2 kb. Since our analysis focuses on the
role of class 1 alleles, class 2 and class 3 alleles were
grouped together as class X. Among the 94 families,
there were 53 in which at least one parent was heterozygous
for class 1 and class X alleles.

Results 

In order to demonstrate the usefulness of the TDT,
we review the findings with respect to population
association and haplotype sharing. The family data
obtained for GAW5 do not lend themselves to a
conventional association study, which would include
unrelated controls. However, when just unrelated
diabetics (the oldest affected sib in each family) are
considered, the frequency of class 1 alleles in the present
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TDT for Alleles I and X of 5'FP in IDDM: Data for l/X 



Parents of All Affected Children

              No. of Alleles Transmitted
               1      X     Total    $\chi^2_{td}$    Significance (P)

Observed      78     46      124      8.26          .004
Expected      62     62



(GAW5) data is 138/162 = .85. This value is similar to 
those reported for "random" diabetics and is higher
than that found in unrelated controls, as has been ob- 
served elsewhere (Cox et al. 1988). 

An analysis of haplotype sharing in the GAW5 family 
data was previously carried out by Cox and Spielman
(1989). Using the $\chi^2$ test statistic of equation (12) ("Y"
of Blackwelder and Elston [1985], applied strictly to
ASPs), Cox and Spielman (1989) did not find even modest
departures from random sharing. This result also
held when they considered only families with at least
one parent heterozygous for class 1/class 3 at the 5'FP.
(For the corresponding test by equation [7] or $t_2$ of
Blackwelder and Elston [1985], see table 7 below.)
Thus there is population association but no evidence
for linkage, by conventional tests.

However, when linkage is tested by the TDT, a different
conclusion emerges (table 5). There were 57 parents
heterozygous for alleles 1 and X of the 5'FP; these
parents transmitted 124 alleles (78 class 1 alleles and 46
class X alleles) to their diabetic offspring. Under the
hypothesis of no linkage, the expected number of
transmissions of 1 and X is equal (i.e., 62). When equation
(5) is used, the difference observed is highly significant;
$\chi^2_{td}$ = $(78-46)^2/124$ = 8.26, p = .004.

As explained above, the difference found with the
TDT could be due to an "artifact" of meiotic segregation
distortion, which would be expected to apply to
both affected and unaffected offspring, if unrelated to
disease. For this reason, we compared affected and
unaffected offspring with respect to transmitted class 1
and class X alleles. The results are shown in table 6.
Among affected offspring, 78 (63%) of 124 alleles
received from heterozygous parents were class 1.
The corresponding figure for unaffected offspring was
42 (40%) of 104; the difference is highly significant
($\chi^2_1$ = 11.5, p < .001). This result confirms the finding of
linkage; there is no evidence for segregation distortion.
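The affected-versus-unaffected comparison can be checked with an ordinary 2 x 2 contingency chi-square. The sketch below is an illustration only: the function name is hypothetical, and it assumes the reported value was computed without a continuity correction, which is the form that reproduces the published 11.5 from the counts quoted in the text.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table

            class 1   class X
    row 1      a         b
    row 2      c         d
    """
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)


# Affected offspring: 78 class 1 and 46 class X alleles received;
# unaffected offspring: 42 class 1 and 62 class X alleles received.
statistic = chi_square_2x2(78, 46, 42, 62)
print(f"chi-square = {statistic:.1f}")
```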
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Table 6 

Comparison of Alleles 1 and X of 5'FP Transmitted
to IDDM-affected Offspring and Unaffected Sibs

              No. of Alleles Transmitted
               1      X     Total    $\chi^2$    Significance (P)

Affected      78     46      124     11.5        <.001
Unaffected    42     62      104

NOTE: Data for 1/X parents.



The strong evidence for linkage, based on the TDT, 
stands in striking contrast to conclusions obtained, in 
earlier studies, from the Y statistic for haplotype shar- 
ing. However, the TDT (above) and the Y statistic were 
based on overlapping but not identical sets of families. 
This discrepancy arose because families with only one 
parent heterozygous 1/X could be used for the TDT 
but not for the Y statistic. Furthermore, parents with 
two distinguishable class 1 alleles were used for Y but 
not for the TDT. 

These differences in the data led us to ask the following
question: Is the failure to find linkage with the Y
statistic due entirely to the difference between the samples
used, or are linkage tests based on haplotype sharing
inherently less sensitive than the TDT for the present
data? To answer this question, we applied the
transmission/disequilibrium ($\chi^2_{td}$) and haplotype-sharing
($\chi^2_{hs}$) tests to exactly the same data. Not all the data
from table 5 can be used, because some are from simplex
families or from sibships with more than two affected
sibs. Accordingly, we used just those families
with at least one 1/X parent and exactly two affected
sibs, as appropriate for equations (4)-(8). Table 7 shows
the data in the form of definitions (4).

For the TDT we compare $i$ with $j$, by equation (6),
and obtain $\chi^2_{td}$ = 3.60 (p = .058). Unlike the corresponding
test above ($\chi^2_{td}$ = 8.26), the present comparison
is not "quite" significant. Although the proportion
of class 1 alleles transmitted here (54/90 = .60) is almost
the same as that in table 5 (78/124 = .63), the
$\chi^2_{td}$ is smaller, and the significance level is less striking,
because of the smaller sample size.

For the haplotype-sharing test, we compare (a) the
number of parents ($i+j$ = 21) who transmitted the same
allele (1 or X) to both affected children with (b) the
number ($h-i-j$ = 24) who transmitted different alleles.
This is equivalent to comparing the number of ASPs
who received the same allele ("shared") with the number
who received different alleles ("unshared"). The
resulting $\chi^2_{hs}$ (0.20) is not significant, and the difference is
in the opposite direction of that predicted by linkage,
presumably reflecting random variation. There is not
even a "trend" toward increased sharing.
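Both statistics in this comparison can be reproduced from the three parent counts in table 7. The Python sketch below is a reconstruction from the reported numbers, not a quotation of equations (6) and (7), which are not part of this excerpt; the function name and the simple "(difference squared) / (total)" forms are assumptions chosen because they return the published 3.60 and 0.20.

```python
def sib_pair_statistics(i, j, h):
    """TDT and haplotype-sharing chi-squares for 1/X parents of affected sib pairs.

    i: parents transmitting class 1 to both affected children
    j: parents transmitting class X to both affected children
    h: total number of 1/X parents
    """
    mixed = h - i - j              # class 1 to one child, class X to the other
    class_1 = 2 * i + mixed        # total class 1 transmissions
    class_x = 2 * j + mixed        # total class X transmissions
    chi_td = (class_1 - class_x) ** 2 / (class_1 + class_x)
    chi_hs = ((i + j) - mixed) ** 2 / h   # same-allele parents vs different-allele parents
    return class_1, class_x, chi_td, chi_hs


class_1, class_x, chi_td, chi_hs = sib_pair_statistics(i=15, j=6, h=45)
print(class_1, class_x, round(chi_td, 2), round(chi_hs, 2))
```

The same 45 parents thus give a transmission imbalance of 54 versus 36 alleles but a sharing split of only 21 versus 24, which is the contrast the text draws.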

Thus, in the present analysis of a single body of data, 
we see the discrepancy identified in earlier reports. 
There is a population association between IDDM and
the class 1 allele of the 5'FP, but sharing of alleles by
affected sibs (cosegregation) provides no evidence for 
linkage. Nevertheless, there is highly significant evi- 
dence of linkage in the TDT. 

Discussion 

Linkage studies for so-called complex genetic dis- 
eases pose problems not found in standard linkage anal- 
ysis. Because these diseases, in general, have reduced 
penetrance, unaffected family members usually provide 
much less information for linkage than do affected 
members. In this situation, it is essential to study fami- 
lies with multiple affected members and to focus on 
affected relatives, such as ASPs. The ASP approach has 
been applied with great success to unravel the role of 
the HLA complex in several diseases to which HLA 
appears to make a large contribution. For a locus that 
makes a modest contribution, however, the approach is 
severely limited. It has been shown by computer simula- 
tion (Cox and Spielman 1989) that, when ASPs are 
used, the power to detect linkage to such a locus is very 
modest and may require hundreds of ASPs. Further- 
more, an additional consequence of the low penetrance 

Table 7

Transmission from 45 1/X Parents of IDDM-affected
Sib Pairs

                  No. of 1/X Parents Who Transmit

Class 1 to Both   Class 1 to One Child and   Class X to
Children          Class X to the Other       Both Children   Total

i = 15            h-i-j = 24                 j = 6           h = 45

NOTE: Data are for comparison of $\chi^2_{td}$ (TDT) and $\chi^2_{hs}$ (haplotype
sharing).
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Table I . HUA disease, associations . Data. : adapted from Tmari and.. 
Terasaki [remainder of note partly illegible]; IDDM DR
data from Thomson et al. (75). Race: C, O, and N as in the original table.
Entries marked [?] are illegible in the scan.

Disease / Antigen              Race   Patients (%)   Controls (%)   Odds Ratio

Ankylosing Spondylitis (AS)
  B27                          C      89             [?]            [?]
  B27                          O      [?]            [?]            [?]
  B27                          N      [?]            [?]            54.4

[disease name illegible]
  A3                           C      [?]            [?]            [?]
  B7                           C      [?]            26             2.9
  B14                          C      [?]            [?]            2.7

IDDM
  [?]                          C      40             [?]            2.5
  B15                          C      22             14             2.1
  DR3                          C      52             [?]            [?]
  DR4                          C      74             24             [?]
  [?]                          C      4              29             0.1

Rheumatoid Arthritis (RA)
  DR4                          C      [?]            25             [?]

Celiac Disease (CD)
  B8                           C      [?]            22             7.6
  DR3                          C      [?]            22             11.6
  DR7                          C      [?]            [?]            [?]
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B7 C 


37 


24..' 




DR2 C 


51 


". ' 27 - 


• * 2.7 
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100 






DR2 C 


*.* 


129.S 


DR2 0 


m 


?4 
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antigen. With more recent typings of the class II loci , initially HLA-DR and 
-D, and now HLA-DP and -DQ, a number of other striking associations have
been found. For example, 93% of patients with IDDM have DR3 or DR4
compared to 41% of controls, and 79% of Caucasian patients with celiac
disease have DR3 compared to 22% of controls (76). For many diseases, the
DR (or other class II) antigens seem to be more strongly associated

The affected sib method is often more powerful in detecting the presence of
disease predisposing genes than are standard association studies. Association
studies require the existence of linkage disequilibrium between the alleles of
the marker and disease predisposing loci. Significant linkage disequilibrium
values are not usually expected for loci with recombination distances greater
than approximately 2% (22). In contrast, deviations from random segregation
of haplotype sharing values in affected sibs can be detected over much larger
recombination distances between the marker loci and the disease predisposing
loci (46). These do not require linkage disequilibrium for the detection of
disease predisposing genes. In many instances, the affected sib pair method
has been a powerful statistical test for linkage; for example, the existence of
HLA linked disease susceptibility to IDDM was statistically demonstrated
using 15 affected sib pairs (9). However, in other instances, for example RA
(45), demonstration of deviations of haplotype sharing from random
expectations has required large sample sizes. Or the deviation may not have
been demonstrated, even when a population association with the disease
exists, for example, the association of the polymorphic region 5' to the insulin
gene (5'FP) with IDDM (7, 74), implying either a high frequency of the
disease predisposing allele (72) or the occurrence of sporadic cases of the
disease. The affected sib pair method has also been generalized to consider
the deviations of identity by state values, rather than identity by descent
values, from random expectations (31); this makes the method generally
applicable for any polymorphic genetic region, since all four parental
chromosomes then do not have to be distinguishable.

A new approach to association studies avoids the use of a separate control
population with its inherent problems of ascertainment bias and possible
ethnic mismatching. This is to consider family studies where it is assumed the
proband is an affected child, and full genetic information is available on both
parents and all children (13, 14, 74). We term this method AFBAC (Affected
Family Based Controls). Within a family, each allele of the four from the
parents is defined as belonging to the diseased category if it ever appears in
an affected individual. In this case, the alleles designated to the "nondiseased"
category provide an appropriate control population with which to compare the
"diseased" population for association studies. Application of this method to
the GAW5 IDDM data [illegible] of the
insulin gene [citation illegible].
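
The allele classification described above can be sketched in a few lines. This is a minimal sketch, assuming the simplest case of one affected child per family with both parents typed; the function names `afbac_split` and `afbac_counts` and the example genotypes are hypothetical and are not taken from the cited studies.

```python
from collections import Counter


def afbac_split(father, mother, child):
    """Split the four parental alleles of one family into those present in
    the affected child ("diseased") and the remainder ("nondiseased")."""
    x, y = child
    # The child must carry one allele from each parent.
    if not ((x in father and y in mother) or (y in father and x in mother)):
        raise ValueError("child genotype is not consistent with the parents")
    parental = Counter(father) + Counter(mother)
    diseased = Counter(child)
    controls = parental - diseased      # multiset difference
    return diseased, controls


def afbac_counts(families):
    """Pool diseased and control allele counts over (father, mother, child) trios."""
    diseased_total, control_total = Counter(), Counter()
    for father, mother, child in families:
        diseased, controls = afbac_split(father, mother, child)
        diseased_total += diseased
        control_total += controls
    return diseased_total, control_total


# Hypothetical genotypes, for illustration only.
families = [
    (("1", "3"), ("1", "2"), ("1", "1")),
    (("1", "2"), ("3", "3"), ("1", "3")),
]
diseased_total, control_total = afbac_counts(families)
print(dict(diseased_total), dict(control_total))
```

The pooled "diseased" and control counts can then be compared allele by allele, as in an ordinary case-control table, without a separately ascertained control population. Families with several affected children would need per-chromosome tracking, which this sketch leaves out.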

It became clear early in the development of mathematical methods that
single locus dominant or recessive models, with incomplete penetrance, were
not sufficient to explain the inheritance patterns of the HLA associated
diseases. Disease heterogeneity in the HLA region was implicated with
demonstrations of synergistic effects for some diseases, notably DR3/DR4

memory system 21,22, the unique neurons described here could
serve as memory storage elements, also activated in the retrieval
process.

Insulin-IGF2 region on
chromosome 11p encodes a gene
implicated in HLA-DR4-dependent
diabetes susceptibility

A class of alleles at the VNTR (variable number of tandem
repeat) locus in the 5' region of the insulin gene (INS) on
chromosome 11p is associated with increased risk of insulin-dependent
diabetes mellitus (IDDM) 1-6, but family studies have failed to
demonstrate linkage 5,7. INS is thought to contribute to IDDM
susceptibility but this view has been difficult to reconcile with the
lack of linkage evidence 6-8. We thus investigated polymorphisms
of INS and neighbouring loci in random diabetics, IDDM multiplex
families and controls. HLA-DR4-positive diabetics showed
an increased risk associated with common variants at polymorphic
sites in a 19-kilobase segment spanned by the 5' INS VNTR and
the third intron of the gene for insulin-like growth factor II (IGF2).
As INS is the major candidate gene from this region, diabetic and
control sequences were compared to identify all INS polymorphisms
that could contribute to disease susceptibility. In multiplex
families the IDDM-associated alleles were transmitted preferentially
to HLA-DR4-positive diabetic offspring from heterozygous
parents. The effect was strongest in paternal meioses, suggesting
a possible role for maternal imprinting. Our results strongly
support the existence of a gene or genes affecting HLA-DR4
IDDM susceptibility which is located in a 19-kilobase region of
INS-IGF2. Our results also suggest new ways to map susceptibility
loci in other common diseases.







FIG. 1 Map of polymorphisms of INS and surrounding loci (□, characterized
in isolated diabetics and controls; ■, characterized in multiplex families),
and primers used for sequencing. The placement of polymorphisms is derived
from Cox et al. 24, or Bell and Seino 25, or deduced from Southern blot
hybridization of single and double digestion of DNA from homozygous
individuals. INS coding sequences are represented in black, 5' and 3'
untranslated regions by hatched lines, and introns A and B as open boxes.
Primers used for PCR amplification are shown above (direct primers) and under
(reverse primers) the maps. Candidate INS polymorphisms were determined
by comparison of independent sequences from normal individuals that had
been deposited in GENBANK 8-13, and by experiments in this study. Restriction
site polymorphisms are designated by their position with respect to the
first base of the initiating ATG (designated by +1), followed by the restriction
enzyme; other polymorphisms are designated by the position followed by
the base-pair change. Generally, for the 5' VNTR we follow the nomenclature
of Bell et al. 14 for different allele size classes (1 represents 570-base pair
(bp) mean; 2, 1,320-bp mean; and 3, 2,470-bp mean). Alleles showing a
smaller amount of size variation can be distinguished in each size class,
but class assignment is unambiguous because of the large differences in
the sizes of alleles in different classes. Presence or absence of a restriction
site in haplotypes has been designated by the use of 'p' or 'a'; other
polymorphisms are designated by the base-pair variant. Comparison of
sequences in GENBANK led to the identification of several potential
polymorphisms including four (-23/HphI, 805/DraIII, 1,127/PstI and 1,140 (A, C))
that were identified as base-pair changes between the two insulin alleles
of one individual 11. New sequence data were obtained: the upstream
untranslated region between the INS 5' VNTR and the initiating ATG (nucleotides
-552 to -19) was sequenced in both haplotypes of six diabetics and eight
controls; a single patient and a single control were selected for sequencing,
on both haplotypes, of the remainder of INS (1,621 bp). After preliminary
analysis, we selected a diabetic who possessed two different haplotypes
associated with elevated risk of disease for further sequencing in the latter
experiment. The two haplotypes contained the alleles 1paaC and 1apaA
(order (VNTR)(-23/HphI)(805/DraIII)(1,127/PstI)(1,140 (A, C))). The control
was homozygote for the haplotype associated with the lowest risk for
disease, 3appA. These experiments confirmed the existence of the four
potential polymorphisms identified above, and identified four new
polymorphisms: at -324 (C insertion) in the 5' region, and 1,355 (C, T), 1,404/Fnu4HI,
1,428/FokI in the 3' flanking sequences. The latter three differences were
also found between published sequences. Although other differences
between published sequences were found, these were not confirmed by
sequence data or PCR experiments done on a set of 10 unrelated individuals
possessing a variety of haplotypes; therefore we conclude that they are
likely to be due to sequencing errors.

TABLE 3 IBD and segregation of alleles at INS locus

(a) Paternal and maternal IBD for diabetic offspring compared with probands in multiplex families

                Alleles from father    Alleles from mother    Combined
Variant         1        0             1        0             1        0

Haplotype       68       54            50       72            118      126
5' VNTR         31       13†           17       15            48       28*
805/DraIII      30       13†           12       14            42       27
1,127/PstI      29       10†           12       10            41       20†
1,428/FokI      30       10†           12       11            42       21†

(b) Segregation of paternal and maternal insulin alleles in relation to the HLA type of diabetic offspring in data from this study and GAW

HLA genotypes and INS allele transmitted to diabetic offspring from informative (+/-) meioses#

Variant/       Parental origin    DRX/X       DR3/X, DR3/3    DR4/X, DR4/4    DR3/4        All DR4
study          of INS allele      +     -     +     -         +     -         +     -      +     -

1,127/PstI
This study     father             2     3     7     11        22    8*        20    7*     42    15‡
               mother             0     0     2     7         7     6         15    14     22    20
               total              2     3     9     19        29    14*       35    21     64    35†

5' VNTR
This study     father             2     3     8     13        22    9*        24    9*     46    18‡
               mother             0     0     6     8         11    9         20    17     31    26
               total              2     3     14    21        33    18*       44    26*    76    44*
GAW            father             0     0     8     6         15    4*        17    7*     32    11‡
               mother             1     2     3     3         10    4         13    10     23    14
               total              1     2     11    9         25    8†        30    17     55    25‡
Combined¶      father             2     3     15    17        36    12†       37    15†    73    27‖
               mother             1     2     9     11        19    12        29    25     48    37
               total              3     5     24    28        55    24†       66    40†    121   64§

IBD statistics were obtained by counting the number of haplotypes shared by diabetic probands and their affected sibs. Overall, 29 patients shared two
haplotypes identical to those of the proband, 60 shared one, and 33 shared none. Expectations for these classes without linkage are 30.5, 61 and 30.5,
respectively. In a, results are presented for meioses that were informative for each of the INS loci. P values for the test of linkage were calculated
from two-sided chi-squared tests. The tests are not independent because of linkage disequilibrium. Alleles of the 5' VNTR locus have been combined into
size classes as described in Table 1. b, The alleles that have been transmitted to the affected offspring in informative meioses, after children were classified
by their HLA type. Families from the GAW study that had recombination between INS region markers were deemed to contain genotype errors and were
removed before analysis. The two studies included eight informative families in common which were removed from the GAW data before their combination.
Contingency table analysis of the 5' VNTR data from both studies showed significant heterogeneity of the transmission frequencies by HLA genotype of
the affected offspring (P<0.025) and by parental sex (P<0.04). P values were calculated from two-sided chi-squared tests for deviations from random
transmission of the INS allele in each HLA-DR group. * P<0.05; † P<0.01; ‡ P<0.001; § P<10^-4; ‖ P<10^-5.

¶ After correction for families common to both studies.
# Plus, transmission of IDDM-associated allele; minus, transmission of another allele.
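
The haplotype-sharing counts quoted in the table note (29, 60 and 33 affected sibs sharing two, one and no haplotypes with the proband) can be checked against the no-linkage expectation. The following is a minimal sketch, assuming a 1:2:1 expectation and a chi-squared goodness-of-fit test with 2 degrees of freedom; the function name is hypothetical, and the note does not say that this exact test was applied to these three classes.

```python
import math


def ibd_sharing_test(n2, n1, n0):
    """Goodness-of-fit of haplotype-sharing counts to the 1:2:1 expectation.

    n2, n1, n0: affected sibs sharing two, one and zero haplotypes with
    the proband. Returns (expected counts, chi-square, P value).
    """
    total = n2 + n1 + n0
    expected = (total / 4, total / 2, total / 4)
    observed = (n2, n1, n0)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return expected, chi2, p_value


expected, chi2, p_value = ibd_sharing_test(29, 60, 33)
print(expected, round(chi2, 3), round(p_value, 3))
```

The expected counts reproduce the 30.5, 61 and 30.5 stated in the note, and the small chi-square is in line with the statement that overall haplotype sharing gives no evidence of linkage.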



carried two 5' VNTR class 1 alleles but were homozygous at
other INS sites. As before, we found no evidence of linkage
when these meioses were included in the counts of haplotype
sharing (Table 3a). But in those offspring whose parents were
heterozygous for one or more of the IDDM-associated INS
variants, the IBD probabilities were significantly greater than
50% at three of the four INS sites studied. Preferential
transmission of 5' VNTR alleles was seen in meioses from parents
who carried a single class 1 allele. Surprisingly, maternal IBD
probabilities did not differ from 50%, whereas the paternal effect
was significant at all INS sites characterized in the multiplex
families (Table 3a). No preferential segregation was seen in
nondiabetic offspring, or in offspring of reference families
obtained from the Centre d'Etude du Polymorphisme Humain
(data not shown).

As the INS association was significant only in HLA-DR4-
positive diabetics, we further subdivided meioses by the HLA
genotypes of the offspring. The IDDM-associated allele PstI a
(1,127/PstI) was transmitted to 38 of 50 HLA-DR4-positive
diabetic offspring in informative male meioses (P<0.0005),
whereas 8 of 21 non-DR4 diabetics received this allele (Table
3b). The effect of maternally transmitted alleles is not significant
for any genotype. Data from other loci, including the 5' VNTR
with alleles grouped into size classes, showed similar segregation
patterns (Table 3b). We thus decided to reanalyse data from
the 5th Genetic Analysis Workshop (GAW) which includes HLA
and 5' VNTR genotypes for 94 IDDM families (data obtained
from F. Clerget-Darpoux). When we classified the 5' VNTR
alleles by their size, the results were remarkably similar to ours
(Table 3b).
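
The transmission counts quoted above (38 of 50 informative male meioses in HLA-DR4-positive offspring, 8 of 21 in non-DR4 offspring) can be tested against random 50% transmission. This is a minimal sketch, assuming a two-sided chi-squared test with 1 degree of freedom of the kind described in the note to Table 3; the function name is hypothetical.

```python
import math


def transmission_test(transmitted, informative):
    """Chi-square (1 df) for deviation from 50% transmission.

    transmitted: informative meioses in which the associated allele was passed on
    informative: total number of informative meioses
    Returns (chi-square, two-sided P value).
    """
    not_transmitted = informative - transmitted
    chi2 = (transmitted - not_transmitted) ** 2 / informative
    # For 1 degree of freedom the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


for label, k, n in [("DR4-positive", 38, 50), ("non-DR4", 8, 21)]:
    chi2, p_value = transmission_test(k, n)
    print(f"{label}: chi2 = {chi2:.2f}, P = {p_value:.4g}")
```

Under these assumptions the DR4-positive counts give a P value below the 0.0005 quoted in the text, while the non-DR4 counts show no significant deviation from random transmission.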

Previous studies of IDDM susceptibility have focused on
insulin as a candidate gene because of its β-cell-specific
expression. We have identified all mutations in or near INS that
could account for a contribution to diabetes susceptibility. As
none of the mutations alter the sequence of the protein, an INS
effect may be due to altered regulation of gene expression.
Alternatively, the IGF2 gene, or an unidentified gene product
from this region may be responsible for susceptibility.

The linkage in this study was observed principally in male
meioses. Maternal-fetal interaction or genomic imprinting could
account for this result. We favour the latter explanation because
of the previously documented maternal imprinting of genes in
this region (11p15) 17-20. Igf2 is known to be imprinted in the
mouse and the INS-IGF2 synteny is conserved between
species. In man, loss of maternal imprinting of 11p15 may be
implicated in the Beckwith-Wiedemann syndrome 19,20, which
is associated with both islet cell hyperplasia and
hyperinsulinaemia 21,22. Maternal imprinting could also account for
the increased risk of disease in offspring of diabetic fathers
compared with diabetic mothers 23.

The following items were sent by first class mail with sufficient postage today,
Feb. 5, 2007, by me, Robert McGinnis, to Mail Stop RCE, Commissioner for
Patents, P.O. Box 1450, Alexandria, VA 22313-1450. The items are for
application 10/037,718, art unit 1637, Examiner Horlick, K.

1 ) Amendment/Response and RCE (26 pages) signed 

2) Enclosures: Selected pages from the McMahon, the Inventor's paper AHG98, 
Julier, Spielman, and Thomson references, some marked. Total of 15 pages. 

3) Credit card form PTO-2038 with fee for Small Entity RCE, 1 month extension, 
extra claims 1 independent, 9 dependent (1 page) signed 

4) This mailing certificate (1 page) signed 
Total pages 43 pages or sheets. 

5) Return receipt post card (signed) 
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